A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

Authors: Armen Abnousi [United States]; Shira L. Broschat [United States]; Ananth Kalyanaraman [United States]

Source:

PLoS ONE [ 1932-6203 ] ; 2016.

RBID : PMC:4995020

English descriptors

KwdEn:
- Algorithms; Conserved Sequence (genetics); Databases, Protein; Protein Domains; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Software.
MESH:
- genetics: Conserved Sequence.
- Algorithms; Databases, Protein; Protein Domains; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Software.

Abstract

Background

Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges.

Methods

In this paper we present NADDA (No-Alignment Domain Detection Algorithm), a new alignment-free method for detecting conserved regions in protein sequences. Our method exploits the abundance of exact-matching short subsequences (k-mers) to quickly detect conserved regions, and uses machine learning to improve prediction accuracy. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable.
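
The abstract does not describe how the k-mer signal is built, so the following is only a minimal sketch of the general idea of using exact k-mer matches as a conservation signal, not NADDA's actual procedure. The function names, the toy sequences, and the choice of k = 4 are all assumptions made for illustration. For each k-mer start position in one sequence, it counts how many other sequences contain that exact k-mer; runs of high counts hint at a shared region.

```python
from collections import defaultdict


def kmer_sequence_index(sequences, k):
    """Map each k-mer to the set of sequence indices that contain it."""
    index = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(seq_id)
    return index


def position_profile(sequences, seq_id, k):
    """For each k-mer start position in one sequence, count how many
    OTHER sequences share that exact k-mer (hypothetical signal)."""
    index = kmer_sequence_index(sequences, k)
    seq = sequences[seq_id]
    return [
        len(index[seq[start:start + k]] - {seq_id})
        for start in range(len(seq) - k + 1)
    ]


# Toy sequences (invented): the first three share the motif "VLAAG".
toy = ["MKVLAAGIT", "QQVLAAGPP", "TTVLAAGWW", "MNNNNNNNN"]
profile = position_profile(toy, 0, 4)
print(profile)
```

In a pipeline of the kind the abstract outlines, a per-position profile like this would presumably be one input to a trained classifier rather than a final answer on its own.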

Results

We compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. Improved results in the comparison with InterPro demonstrate the ability of NADDA to capture conserved regions beyond those present in Pfam. We also compared NADDA with ADDA and MKDOM2, taking Pfam as ground truth. On average, NADDA shows comparable accuracy and more balanced sensitivity and specificity and, being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, for a data set of approximately 2500 sequences.
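
For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of accuracy, sensitivity, and specificity computed from confusion-matrix counts, along with the speedups implied by the runtimes quoted above. The counts passed to `binary_metrics` are hypothetical and are not taken from the paper, and the abstract does not state the unit at which predictions were scored.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts, for illustration only (not from the paper).
m = binary_metrics(tp=80, fp=10, tn=30, fn=20)

# Single-processor runtimes in seconds, as reported in the abstract.
runtimes = {"NADDA": 49, "ADDA": 10566, "MKDOM2": 456}
speedup = {name: t / runtimes["NADDA"] for name, t in runtimes.items()}

print(m)
print(speedup)
```

On the reported runtimes, NADDA comes out roughly 216 times faster than ADDA and roughly 9 times faster than MKDOM2.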

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995020

DOI: 10.1371/journal.pone.0161338
PubMed: 27552220
PubMed Central: 4995020

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L." last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff003"><addr-line>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27552220</idno>
<idno type="pmc">4995020</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995020</idno>
<idno type="RBID">PMC:4995020</idno>
<idno type="doi">10.1371/journal.pone.0161338</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">001025</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001025</idno>
<idno type="wicri:Area/Pmc/Curation">001025</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001025</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000C74</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000C74</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:27552220</idno>
<idno type="wicri:Area/PubMed/Corpus">000F93</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000F93</idno>
<idno type="wicri:Area/PubMed/Curation">000F93</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000F93</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001251</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">001251</idno>
<idno type="wicri:Area/Ncbi/Merge">001755</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L." last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff003"><addr-line>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Conserved Sequence (genetics)</term>
<term>Databases, Protein</term>
<term>Protein Domains</term>
<term>Protein Structure, Tertiary</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence de protéine</term>
<term>Bases de données de protéines</term>
<term>Domaines protéiques</term>
<term>Logiciel</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Structure tertiaire des protéines</term>
<term>Séquence conservée (génétique)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Conserved Sequence</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Séquence conservée</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Databases, Protein</term>
<term>Protein Domains</term>
<term>Protein Structure, Tertiary</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence de protéine</term>
<term>Bases de données de protéines</term>
<term>Domaines protéiques</term>
<term>Logiciel</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Structure tertiaire des protéines</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec id="sec001"><title>Background</title>
<p>Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges.</p>
</sec>
<sec id="sec002"><title>Methods</title>
<p>In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (<italic>k</italic>-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable.</p>
</sec>
<sec id="sec003"><title>Results</title>
<p>We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences.</p>
</sec>
</div>
</front>
</TEI>
<double pmid="27552220"><pmc><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L." last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff003"><addr-line>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">27552220</idno>
<idno type="pmc">4995020</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995020</idno>
<idno type="RBID">PMC:4995020</idno>
<idno type="doi">10.1371/journal.pone.0161338</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">001025</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">001025</idno>
<idno type="wicri:Area/Pmc/Curation">001025</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">001025</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000C74</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000C74</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L." last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff003"><addr-line>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:aff id="aff001"><addr-line>School of EECS, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
<affiliation wicri:level="2"><nlm:aff id="aff002"><addr-line>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States of America</addr-line>
</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">PLoS ONE</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec id="sec001"><title>Background</title>
<p>Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges.</p>
</sec>
<sec id="sec002"><title>Methods</title>
<p>In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (<italic>k</italic>-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable.</p>
</sec>
<sec id="sec003"><title>Results</title>
<p>We have compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground truth. On average, NADDA shows comparable accuracy, more balanced sensitivity and specificity, and, being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprising approximately 2500 sequences.</p>
</sec>
</div>
</front>
<back><div1 type="bibliography"><listBibl><biblStruct><analytic><author><name sortKey="Consortium, U" uniqKey="Consortium U">U Consortium</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Doolittle, Rf" uniqKey="Doolittle R">RF Doolittle</name>
</author>
<author><name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Heger, A" uniqKey="Heger A">A Heger</name>
</author>
<author><name sortKey="Holm, L" uniqKey="Holm L">L Holm</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Portugaly, E" uniqKey="Portugaly E">E Portugaly</name>
</author>
<author><name sortKey="Harel, A" uniqKey="Harel A">A Harel</name>
</author>
<author><name sortKey="Linial, N" uniqKey="Linial N">N Linial</name>
</author>
<author><name sortKey="Linial, M" uniqKey="Linial M">M Linial</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gouzy, J" uniqKey="Gouzy J">J Gouzy</name>
</author>
<author><name sortKey="Corpet, F" uniqKey="Corpet F">F Corpet</name>
</author>
<author><name sortKey="Kahn, D" uniqKey="Kahn D">D Kahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Tress, M" uniqKey="Tress M">M Tress</name>
</author>
<author><name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
<author><name sortKey="Baldi, P" uniqKey="Baldi P">P Baldi</name>
</author>
<author><name sortKey="Joo, K" uniqKey="Joo K">K Joo</name>
</author>
<author><name sortKey="Lee, J" uniqKey="Lee J">J Lee</name>
</author>
<author><name sortKey="Seo, Jh" uniqKey="Seo J">JH Seo</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Tai, Ch" uniqKey="Tai C">CH Tai</name>
</author>
<author><name sortKey="Lee, Wj" uniqKey="Lee W">WJ Lee</name>
</author>
<author><name sortKey="Vincent, Jj" uniqKey="Vincent J">JJ Vincent</name>
</author>
<author><name sortKey="Lee, B" uniqKey="Lee B">B Lee</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Eickholt, J" uniqKey="Eickholt J">J Eickholt</name>
</author>
<author><name sortKey="Deng, X" uniqKey="Deng X">X Deng</name>
</author>
<author><name sortKey="Cheng, J" uniqKey="Cheng J">J Cheng</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Madden, Tl" uniqKey="Madden T">TL Madden</name>
</author>
<author><name sortKey="Sch Ffer, Aa" uniqKey="Sch Ffer A">AA Schäffer</name>
</author>
<author><name sortKey="Zhang, J" uniqKey="Zhang J">J Zhang</name>
</author>
<author><name sortKey="Zhang, Z" uniqKey="Zhang Z">Z Zhang</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Gouzy, J" uniqKey="Gouzy J">J Gouzy</name>
</author>
<author><name sortKey="Eugene, P" uniqKey="Eugene P">P Eugene</name>
</author>
<author><name sortKey="Greene, Ea" uniqKey="Greene E">EA Greene</name>
</author>
<author><name sortKey="Kahn, D" uniqKey="Kahn D">D Kahn</name>
</author>
<author><name sortKey="Corpet, F" uniqKey="Corpet F">F Corpet</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sonnhammer, El" uniqKey="Sonnhammer E">EL Sonnhammer</name>
</author>
<author><name sortKey="Kahn, D" uniqKey="Kahn D">D Kahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Bru, C" uniqKey="Bru C">C Bru</name>
</author>
<author><name sortKey="Courcelle, E" uniqKey="Courcelle E">E Courcelle</name>
</author>
<author><name sortKey="Carrere, S" uniqKey="Carrere S">S Carrère</name>
</author>
<author><name sortKey="Beausse, Y" uniqKey="Beausse Y">Y Beausse</name>
</author>
<author><name sortKey="Dalmar, S" uniqKey="Dalmar S">S Dalmar</name>
</author>
<author><name sortKey="Kahn, D" uniqKey="Kahn D">D Kahn</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct><analytic><author><name sortKey="Altschul, Sf" uniqKey="Altschul S">SF Altschul</name>
</author>
<author><name sortKey="Gish, W" uniqKey="Gish W">W Gish</name>
</author>
<author><name sortKey="Miller, W" uniqKey="Miller W">W Miller</name>
</author>
<author><name sortKey="Myers, Ew" uniqKey="Myers E">EW Myers</name>
</author>
<author><name sortKey="Lipman, Dj" uniqKey="Lipman D">DJ Lipman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Smith, Tf" uniqKey="Smith T">TF Smith</name>
</author>
<author><name sortKey="Waterman, Ms" uniqKey="Waterman M">MS Waterman</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Dean, J" uniqKey="Dean J">J Dean</name>
</author>
<author><name sortKey="Ghemawat, S" uniqKey="Ghemawat S">S Ghemawat</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Breiman, L" uniqKey="Breiman L">L Breiman</name>
</author>
<author><name sortKey="Friedman, Jh" uniqKey="Friedman J">JH Friedman</name>
</author>
<author><name sortKey="Olshen, Ra" uniqKey="Olshen R">RA Olshen</name>
</author>
<author><name sortKey="Stone, Cj" uniqKey="Stone C">CJ Stone</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Quilan, J" uniqKey="Quilan J">J Quilan</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Ho, Tk" uniqKey="Ho T">TK Ho</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Plimpton, Sj" uniqKey="Plimpton S">SJ Plimpton</name>
</author>
<author><name sortKey="Devine, Kd" uniqKey="Devine K">KD Devine</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Pedregosa, F" uniqKey="Pedregosa F">F Pedregosa</name>
</author>
<author><name sortKey="Varoquaux, G" uniqKey="Varoquaux G">G Varoquaux</name>
</author>
<author><name sortKey="Gramfort, A" uniqKey="Gramfort A">A Gramfort</name>
</author>
<author><name sortKey="Michel, V" uniqKey="Michel V">V Michel</name>
</author>
<author><name sortKey="Thirion, B" uniqKey="Thirion B">B Thirion</name>
</author>
<author><name sortKey="Grisel, O" uniqKey="Grisel O">O Grisel</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Jones, P" uniqKey="Jones P">P Jones</name>
</author>
<author><name sortKey="Binns, D" uniqKey="Binns D">D Binns</name>
</author>
<author><name sortKey="Chang, Hy" uniqKey="Chang H">HY Chang</name>
</author>
<author><name sortKey="Fraser, M" uniqKey="Fraser M">M Fraser</name>
</author>
<author><name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
<author><name sortKey="Mcanulla, C" uniqKey="Mcanulla C">C McAnulla</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Schultz, J" uniqKey="Schultz J">J Schultz</name>
</author>
<author><name sortKey="Milpetz, F" uniqKey="Milpetz F">F Milpetz</name>
</author>
<author><name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
<author><name sortKey="Ponting, Cp" uniqKey="Ponting C">CP Ponting</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Finn, Rd" uniqKey="Finn R">RD Finn</name>
</author>
<author><name sortKey="Coggill, P" uniqKey="Coggill P">P Coggill</name>
</author>
<author><name sortKey="Eberhardt, Ry" uniqKey="Eberhardt R">RY Eberhardt</name>
</author>
<author><name sortKey="Eddy, Sr" uniqKey="Eddy S">SR Eddy</name>
</author>
<author><name sortKey="Mistry, J" uniqKey="Mistry J">J Mistry</name>
</author>
<author><name sortKey="Mitchell, Al" uniqKey="Mitchell A">AL Mitchell</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Sigrist, Cj" uniqKey="Sigrist C">CJ Sigrist</name>
</author>
<author><name sortKey="De Castro, E" uniqKey="De Castro E">E De Castro</name>
</author>
<author><name sortKey="Cerutti, L" uniqKey="Cerutti L">L Cerutti</name>
</author>
<author><name sortKey="Cuche, Ba" uniqKey="Cuche B">BA Cuche</name>
</author>
<author><name sortKey="Hulo, N" uniqKey="Hulo N">N Hulo</name>
</author>
<author><name sortKey="Bridge, A" uniqKey="Bridge A">A Bridge</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Haft, Dh" uniqKey="Haft D">DH Haft</name>
</author>
<author><name sortKey="Selengut, Jd" uniqKey="Selengut J">JD Selengut</name>
</author>
<author><name sortKey="Richter, Ra" uniqKey="Richter R">RA Richter</name>
</author>
<author><name sortKey="Harkins, D" uniqKey="Harkins D">D Harkins</name>
</author>
<author><name sortKey="Basu, Mk" uniqKey="Basu M">MK Basu</name>
</author>
<author><name sortKey="Beck, E" uniqKey="Beck E">E Beck</name>
</author>
</analytic>
</biblStruct>
<biblStruct><analytic><author><name sortKey="Srivastava, A" uniqKey="Srivastava A">A Srivastava</name>
</author>
<author><name sortKey="Han, Eh" uniqKey="Han E">EH Han</name>
</author>
<author><name sortKey="Kumar, V" uniqKey="Kumar V">V Kumar</name>
</author>
<author><name sortKey="Singh, V" uniqKey="Singh V">V Singh</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions.</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L" last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="RBID">pubmed:27552220</idno>
<idno type="pmid">27552220</idno>
<idno type="doi">10.1371/journal.pone.0161338</idno>
<idno type="wicri:Area/PubMed/Corpus">000F93</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000F93</idno>
<idno type="wicri:Area/PubMed/Curation">000F93</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000F93</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001251</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001251</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions.</title>
<author><name sortKey="Abnousi, Armen" sort="Abnousi, Armen" uniqKey="Abnousi A" first="Armen" last="Abnousi">Armen Abnousi</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Broschat, Shira L" sort="Broschat, Shira L" uniqKey="Broschat S" first="Shira L" last="Broschat">Shira L. Broschat</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kalyanaraman, Ananth" sort="Kalyanaraman, Ananth" uniqKey="Kalyanaraman A" first="Ananth" last="Kalyanaraman">Ananth Kalyanaraman</name>
<affiliation wicri:level="2"><nlm:affiliation>School of EECS, Washington State University, Pullman, WA, United States of America.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>School of EECS, Washington State University, Pullman, WA</wicri:regionArea>
<placeName><region type="state">Washington (État)</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j">PloS one</title>
<idno type="eISSN">1932-6203</idno>
<imprint><date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Conserved Sequence (genetics)</term>
<term>Databases, Protein</term>
<term>Protein Domains</term>
<term>Protein Structure, Tertiary</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence de protéine</term>
<term>Bases de données de protéines</term>
<term>Domaines protéiques</term>
<term>Logiciel</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Structure tertiaire des protéines</term>
<term>Séquence conservée (génétique)</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Conserved Sequence</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Séquence conservée</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Databases, Protein</term>
<term>Protein Domains</term>
<term>Protein Structure, Tertiary</term>
<term>Sequence Alignment</term>
<term>Sequence Analysis, Protein</term>
<term>Sequence Homology, Amino Acid</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Alignement de séquences</term>
<term>Analyse de séquence de protéine</term>
<term>Bases de données de protéines</term>
<term>Domaines protéiques</term>
<term>Logiciel</term>
<term>Similitude de séquences d'acides aminés</term>
<term>Structure tertiaire des protéines</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges.</div>
</front>
</TEI>
</pubmed>
</double>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001755 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 001755 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:4995020
   |texte=   A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:27552220" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

	MERS exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
